Abstract. In the context of the Semantic Web, one of the most important issues
related to the class-membership prediction task (through inductive models) on
ontological knowledge bases concerns the imbalance of the training examples
distribution, mostly due to the heterogeneous nature and the incompleteness of
the knowledge bases. An ensemble learning approach has been proposed to cope
with this problem. However, the majority voting procedure, exploited for deciding
the membership, does not explicitly consider the uncertainty and the conflict
among the classifiers of an ensemble model. Moving from this observation,
we propose to integrate the Dempster-Shafer (DS) theory with ensemble learning.
Specifically, we propose an algorithm for learning Evidential Terminological
Random Forest models, an extension of Terminological Random Forests along
with the DS theory. An empirical evaluation showed that: i) the resulting models
perform better for datasets with a lot of positive and negative examples and have
a less conservative behavior than the voting-based forests; ii) the new extension
decreases the variance of the results.



1 Introduction 

In the context of the Semantic Web (SW), ontologies and the ability to perform reasoning
on them, via deductive methods, play a key role. However, standard inference mechanisms
have also shown their limitations due to the incompleteness of ontological knowledge
bases deriving from the Open World Assumption (OWA). In order to overcome
this problem, alternative forms of reasoning, such as inductive reasoning, have been
adopted to perform various tasks such as concept retrieval and query answering [1,2].
These tasks have been cast as a classification problem, consisting in deciding the
class-membership of an individual with respect to a query concept, to be solved through
inductive learning methods that exploit statistical regularities in a knowledge base. The
resulting models can be directly applied to the knowledge base or mixed with deductive
reasoning capabilities [3]. Although the application of these methods has shown
interesting results and the ability to induce assertional knowledge that is not logically
derivable, these methods have also revealed some problems due to the aforementioned
incompleteness. In general, the individuals that are positive and negative instances for
a given concept may not be equally distributed. This skewness may be stronger when
considering individuals whose membership cannot be assessed because of the OWA.
This class-imbalance setting may affect the model, resulting in poor performance.




Various methods have been devised for tackling the problem, spanning from sampling
methods to ensemble learning approaches [4]. Concerning the specific task of
instance classification for inductive query answering on SW knowledge bases, we
investigated the use of ensemble methods [5], where the resulting model is built
by training a certain number of classifiers, called weak learners, and the predictions
returned by each weak learner are combined by a rule standing for the meta-learner.
Specifically, we proposed an algorithm for inducing Terminological Random Forests
(TRFs) [5], an ensemble of Terminological Decision Trees (TDTs) [6]. The method
extends Random Forests and First Order Random Forests [7, 8] to the case of DL
representation languages. When these models are employed, the membership for a test
individual is decided according to a majority vote rule (although various strategies for
combining predictions have been proposed [9-11]): each classifier returning a vote in
favor of a class equally contributes to the final decision. In this way, some aspects are
not considered explicitly, such as the uncertainty about the class label assignment and
the disagreement that may exist among weak learners. The latter plays a crucial role for
the performance of ensemble models [12]. In the specific case of TRFs, we noted that
most misclassifications were related to those situations in which votes are distributed
evenly with respect to the admissible labels.

A weighted voting procedure may be an alternative strategy to mitigate the problem,
but it requires a criterion for setting the weights. In this sense, introducing a
meta-learner which manipulates soft predictions of each classifier (i.e. a prediction with a
confidence measure for each class value) rather than hard predictions (where a class value
is returned) may be a solution. For TRFs, this can be done by considering the extension
of TDT models based on the Dempster-Shafer Theory (DS) [13], which provides an
explicit representation of ignorance and uncertainty (differently from the original version
proposed in [6]). In machine learning, resorting to the DS operators is a well-known
solution [14]. Most of the existing ensemble combination methods resort to a solution
based on decision templates, which are obtained by organizing, for each classifier
against each class, a mean vector (called reference vector). When these methods are
employed, predictions are typically made by computing the similarity value between a
decision profile of an unknown instance and the decision templates. Other approaches
that do not require the computation of these matrices have been proposed [14]. However,
all these methods consider a propositional representation. Additionally, none of them
has been employed for predicting assertions on ontological knowledge bases.

The main contribution of the paper concerns the definition of a framework for the in- 
duction of Evidential Terminological Random Forests for ontological knowledge bases. 
This is an ensemble learning approach that employs Evidential TDTs (ETDTs) [13] 
and does not require the computation of decision templates, similarly to [14]. After the 
induction of the forest, a new individual is classified by combining, by means of
Dempster's rule [15], the available evidence on the membership coming from each tree.

The remainder of the paper is organized as follows: the next section recalls the basics
of the Dempster-Shafer Theory; Sect. 3 presents the novel framework for evidential
terminological random forests, while in Sect. 4, a preliminary empirical evaluation is
described. Sect. 5 draws conclusions and illustrates perspectives for further
developments.




2 Basics of the Dempster-Shafer Theory



The Dempster-Shafer Theory (DS) is basically an extension of the Bayesian subjective
probability. In the DS, the frame of discernment is a set of exhaustive and mutually
exclusive hypotheses $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ about a domain.
For instance, the frame of discernment for a classification problem could be the set of
all admissible class values. Moving from this set, it is possible to define a Basic Belief
Assignment (BBA) as follows:

Definition 1 (Basic Belief Assignment). Given a frame of discernment
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$, a Basic Belief Assignment (BBA) is a
function that defines a mapping $m : 2^{\Omega} \to [0, 1]$ such that:

$$\sum_{A \in 2^{\Omega}} m(A) = 1 \qquad (1)$$

Given a piece of evidence, the value of a BBA m for a set A expresses a measure of
belief exactly committed to A. This means that the value m(A) implies no further
claims about any of its subsets. Hence, when $A = \Omega$, a case of total ignorance
occurs. Each element $A \in 2^{\Omega}$ for which $m(A) > 0$ is said to be a focal
element for m. The function m can be used to define other functions, such as the belief
and the plausibility function.

Definition 2 (Belief Function and Plausibility Function). For a set $A \subseteq \Omega$,
the belief in A, denoted Bel(A), represents a measure of the total belief committed to A
given the available evidence.

$$\forall A, B \in 2^{\Omega} \quad Bel(A) = \sum_{B \subseteq A} m(B) \qquad (2)$$

The plausibility of A, denoted Pl(A), represents the amount of belief that could be
placed in A, if further information became available.

$$\forall A, B \in 2^{\Omega} \quad Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) \qquad (3)$$

It can be proved that knowing just one among m, Bel and Pl allows one to derive all the
other functions [16].
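
As an illustration of Definitions 1 and 2, the following minimal Python sketch computes Bel and Pl from a BBA on the two-label frame used later in the paper. The dictionary-based representation (focal elements as frozensets), the function names and the numeric masses are assumptions made here for illustration only, and the sketch assumes that no mass is assigned to the empty set.

```python
# Frame of discernment {-1, +1}; focal elements are represented as frozensets.
POS, NEG = frozenset({+1}), frozenset({-1})
UNK = POS | NEG  # the whole frame: total ignorance


def belief(m, a):
    """Bel(A): sum of the masses of the focal elements B contained in A."""
    return sum(mass for b, mass in m.items() if b <= a)


def plausibility(m, a):
    """Pl(A): sum of the masses of the focal elements B that intersect A."""
    return sum(mass for b, mass in m.items() if b & a)


# Hypothetical BBA (illustrative values): the masses sum to 1 as required by Eq. (1).
m = {POS: 0.6, NEG: 0.1, UNK: 0.3}

print(belief(m, POS), plausibility(m, POS))
```

For this BBA the belief in {+1} is about 0.6 and its plausibility about 0.9, so that Pl({+1}) = 1 - Bel({-1}); the gap between the two values is the mass left on the whole frame.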

In the DS, various measures for quantifying the amount of uncertainty have been
proposed, e.g. the non-specificity measure [17]. The latter can be regarded as a measure
for representing the imprecision of a BBA function. This measure can be computed by
the following equation:

$$Ns = \sum_{A \in 2^{\Omega}} m(A) \log(|A|) \qquad (4)$$

It is easy to note that the non-specificity value is higher when the focal elements are
larger subsets of $\Omega$, for the elements of which no further claims can be made.
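
As a worked illustration of Eq. (4), consider a hypothetical BBA on the frame $\Omega = \{-1, +1\}$ with $m(\{+1\}) = 0.6$, $m(\{-1\}) = 0.1$ and $m(\Omega) = 0.3$ (values chosen here only for illustration). Only the non-singleton focal element contributes to the measure:

```latex
\begin{align*}
Ns &= m(\{+1\}) \log|\{+1\}| + m(\{-1\}) \log|\{-1\}| + m(\Omega) \log|\Omega| \\
   &= 0.6 \cdot \log 1 + 0.1 \cdot \log 1 + 0.3 \cdot \log 2 \\
   &= 0.3 \log 2
\end{align*}
```

At the two extremes, a BBA with $m(\Omega) = 1$ (total ignorance) reaches the maximum value $\log 2$ on this frame, while a BBA whose focal elements are all singletons has $Ns = 0$.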

One of the most important aspects related to the DS is the availability of various
operators for pooling evidence from different sources of information. One of them, called
Dempster's rule, aggregates independent pieces of evidence defined within the same frame
of discernment. Let $m_1$ and $m_2$ be two BBAs. The new BBA obtained by combining $m_1$
and $m_2$ using the rule of combination, $m_{12}$, can be expressed by the orthogonal sum of
$m_1$ and $m_2$. Generally, the normalized version of the rule is used:

$$\forall A, B, C \subseteq \Omega \quad m_{12}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - c} \sum_{B \cap C = A} m_1(B)\, m_2(C) \qquad (5)$$

where the conflict c can be computed as: $c = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$.
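
The normalized rule of Eq. (5) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: BBAs are dictionaries mapping frozensets to masses, and the function name `dempster_combine`, the tolerance used to detect total conflict and the numeric masses are all hypothetical choices made here.

```python
def dempster_combine(m1, m2):
    """Normalized Dempster's rule; returns the combined BBA and the conflict c."""
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            product = mass_b * mass_c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + product
            else:
                conflict += product  # mass that would fall on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("totally conflicting sources: the rule is undefined")
    normalized = {a: mass / (1.0 - conflict) for a, mass in combined.items()}
    return normalized, conflict


POS, NEG = frozenset({+1}), frozenset({-1})
UNK = POS | NEG

# Two hypothetical sources that partially disagree (illustrative values).
m1 = {POS: 0.6, NEG: 0.1, UNK: 0.3}
m2 = {POS: 0.2, NEG: 0.5, UNK: 0.3}
m12, c = dempster_combine(m1, m2)
print(c, m12)
```

In this example the conflict is 0.6 * 0.5 + 0.1 * 0.2 = 0.32, and the remaining mass (0.68) is renormalized over {+1}, {-1} and the whole frame.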

In the DS, the independence of the available pieces of evidence is typically a strong
constraint that can be relaxed by using further combination rules, e.g. the Dubois-Prade
rule [18]:

$$m_{12}(A) = \sum_{B \cup C = A} m_1(B)\, m_2(C) \qquad (6)$$

Differently from Dempster's rule, the latter considers the union between two sets
of hypotheses rather than their intersection. As a result, the conflict between sources of
information does not exist.
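
The union-based combination of Eq. (6) can be sketched analogously. The sketch below is an illustration only: the dictionary representation of BBAs and the function name `union_combine` are assumptions made here, and the code implements Eq. (6) exactly as stated above.

```python
def union_combine(m1, m2):
    """Combine two BBAs by accumulating products of masses on unions, as in Eq. (6)."""
    combined = {}
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            union = b | c
            combined[union] = combined.get(union, 0.0) + mass_b * mass_c
    return combined


POS, NEG = frozenset({+1}), frozenset({-1})
UNK = POS | NEG

# Two totally conflicting sources: no normalization is needed, and all the
# mass is moved to the whole frame (uncertain membership).
print(union_combine({POS: 1.0}, {NEG: 1.0}))
```

Since a union of non-empty sets is never empty, no mass is lost and no normalization factor is required; disagreement between the sources shows up as mass on larger subsets instead.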

3 Evidence-based Ensemble Learning for Description Logic 

The TDT (and TRF) learning approach is now recalled before introducing the method for
the induction of evidence-based versions of these classification models.

3.1 Class-Imbalance and Terminological Random Forests 

In machine learning, the class-imbalance problem concerns the skewness of the training
data distribution. Considering a multilabel setting, where the number of class labels is
greater than 3, the problem usually occurs when the number of training instances
belonging to a particular class (the majority class) overwhelms the number of those
belonging to the other classes (which represent the minority classes). In order to tackle the
problem, several strategies based on sampling have been proposed [19].
One of the simplest methods is an under-sampling strategy that randomly discards
instances belonging to the majority class in order to re-balance the dataset. However, this
method causes a loss of information due to the possible discarding of useful examples
required for inducing a sufficiently predictive model. A Terminological Random Forest
(TRF) is an ensemble model trained through a procedure that combines a random
under-sampling strategy with ensemble learning induction [5]. The main purpose for the
induction of these models is to mitigate the loss of information mentioned above in the
context of SW knowledge bases. A TRF is basically made up of a certain number of
Terminological Decision Trees (TDTs) [6], each of which is built by considering
a (quasi-)balanced dataset. The ensemble model assigns the final class for a new
individual by appealing to a majority vote procedure. Therefore each TDT returns a hard
prediction: this means that each tree contributes equally to the decision concerning the
class label, regardless of its confidence about the prediction. In order to also consider this
kind of information and to tackle problems such as the uncertainty about the class
assignment (i.e. when the confidence about either a class or another one is approximately
equal) and the disagreement between classifiers that may lead to misclassifications
[5], we need to resort to other models for the ensemble approach, such as Evidential
Terminological Decision Trees [13].

Fig. 1. A simple example of ETDT: each node contains a DL concept description and a BBA
obtained by counting the instances that reach the node during the training phase

3.2 Evidential Terminological Decision Trees 

In [13], it has been shown how the class-membership prediction task can be tackled
by inducing Evidential Terminological Decision Trees (ETDTs), an extension of the
TDTs [6] based on evidential reasoning. ETDTs are defined similarly to TDTs.
However, unlike TDTs, each node contains a couple (D, m), where D is a DL concept
description and m is a BBA concerning the membership w.r.t. D, rather than the sole
concept description. Practically, to learn an ETDT model, a set of concept descriptions
is generated from the current node by resorting to the refinement operator, denoted
by $\rho$. For each concept, a BBA is also computed by considering the positive, negative
and uncertain instances w.r.t. the generated concept. Then the best description (and the
corresponding BBA) is selected, i.e. the one having the smallest non-specificity measure
value w.r.t. the previous level. In other words, the selected description is the one
having the most definite membership.
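
The selection step can be illustrated with a short Python sketch. It rests on assumptions that go beyond the text: the BBA of a candidate is taken to be the relative frequency of positive, negative and uncertain instances, the logarithm in the non-specificity measure is taken in base 2, and the names `bba_from_counts` and `select_best_candidate` as well as the candidate labels and counts are hypothetical.

```python
import math

POS, NEG = frozenset({+1}), frozenset({-1})
UNK = POS | NEG


def bba_from_counts(n_pos, n_neg, n_unc):
    """Assumed estimate: masses are the relative frequencies of the three groups."""
    total = n_pos + n_neg + n_unc
    if total == 0:
        return {UNK: 1.0}  # no instances at all: total ignorance
    return {POS: n_pos / total, NEG: n_neg / total, UNK: n_unc / total}


def non_specificity(m):
    """Non-specificity of a BBA (base-2 logarithm assumed)."""
    return sum(mass * math.log2(len(a)) for a, mass in m.items() if mass > 0)


def select_best_candidate(candidates):
    """Return the (description, BBA) pair with the smallest non-specificity."""
    scored = [(desc, bba_from_counts(*counts)) for desc, counts in candidates]
    return min(scored, key=lambda pair: non_specificity(pair[1]))


# Hypothetical candidates: (description, (#positive, #negative, #uncertain)).
candidates = [("E1", (5, 5, 10)), ("E2", (8, 2, 0)), ("E3", (3, 3, 4))]
best_description, best_bba = select_best_candidate(candidates)
print(best_description)
```

Under these assumptions the candidate with no uncertain-membership instances is preferred, since all its mass lies on singletons and its non-specificity is 0.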

Fig. 1 reports a simple example of ETDT used for predicting whether a car is to be
sent back to the factory (SendBack) or can be repaired. We can observe that the root
concept $\exists hasPart.\top$ is progressively specialized. Additionally, the concepts
installed into the intermediate nodes are characterized by a decreasing non-specificity
measure value.

3.3 Evidential Terminological Random Forests 

An Evidential Terminological Random Forest (ETRF) is an ensemble of ETDTs. We
will focus on the procedures for producing an ETRF and for predicting class-membership
of input individuals exploiting an ETRF. Moving from the formulation of the concept
learning problem proposed in [5], we will use the label set $\mathcal{L} = \{-1, +1\}$ as
frame of discernment of the problem. The labels in $\mathcal{L}$ are usually used to denote,
respectively, the cases of negative and positive membership w.r.t. a target concept C.
However, in order to represent the uncertain-membership related to the Open World
Assumption, we will employ the label set $\mathcal{L}' = 2^{\mathcal{L}} \setminus \{\emptyset\}$
and the singletons $\{+1\}$ and $\{-1\}$ to denote the positive and negative membership
w.r.t. C, while the case of uncertain-membership will be labeled by
$\mathcal{L} = \{-1, +1\}$.



Growing ETRFs. Alg. 1 describes the procedure for producing an ETRF. In order to
do this, the target concept C, a training set $Tr \subseteq \mathrm{Ind}(\mathcal{A})$ and
the desired number of trees n are required. Tr may contain not only positive and negative
examples but also instances with uncertain membership w.r.t. C. According to a bagging
approach, the training individuals are sampled with replacement in order to obtain n
subsets $D_i \subseteq Tr$, with $i = 1, \ldots, n$. In order to obtain the $D_i$s, it is
possible to apply various sampling strategies although, in this work, we followed the
approach proposed in [5]. Firstly, the initial data distribution is considered by adopting a
stratified sampling w.r.t. the class-membership values in order to represent instances of
the minority class. In the second phase, undersampling can be performed on the training
set in order to obtain (quasi-)balanced $D_i$ sets (i.e. with a class imbalance that will not
affect much the training process). This means that if the majority class is the negative
one, the exceeding part of the counterexamples is randomly discarded. In the dual case,
positive instances are removed. In addition, the sampling procedure also removes all the
uncertain instances. In Alg. 1, the procedure that returns the sets $D_i$ implementing this
strategy is BALANCEDBOOTSTRAPSAMPLE. For each $D_i$, an ETDT T is built by means of a
recursive strategy, as described in [13], which is implemented by the procedure
INDUCEETDTREE. It distinguishes various cases. The first one uses the prior probability
(estimate) to cope with the lack of examples ($|Ps| = 0$ and $|Ns| = 0$). The second one
sets the class label for a leaf node if it is sufficiently pure, i.e. no positive (resp.
negative) example is found while most examples are negative (resp. positive). This purity
condition is evaluated by considering the BBA m given as input for the algorithm
($m(\{-1\}) \simeq 0$ and $m(\{+1\}) > \theta$, or $m(\{+1\}) \simeq 0$ and
$m(\{-1\}) > \theta$). The values of a BBA function for the membership values are obtained
by computing the number of positive, negative and uncertain-membership instances w.r.t.
the current concept. Finally, the third (recursive) case concerns the availability of both
negative and positive examples. In this case, the current concept description D has to be
specialized by means of an operator exploring the search space of downward refinements
of D. Following the approach described in [5, 8], the refinement step produces a set of
candidate specializations $\rho(D)$ and a subset of them, namely RS, is then randomly
selected (via function RANDOMSELECTION) by setting its cardinality according to the value
returned by a function f applied to the cardinality of the set of specializations returned
by the refinement operator (e.g. $\sqrt{|\rho(D)|}$). A BBA $m'$ is then built for each
candidate $E \in RS$. Again, the function can be obtained by counting the number of
positive, negative and uncertain-membership instances. Then the best pair
$\langle E^*, m^* \rangle \in S$ according to the non-specificity measure
employed in [13] is determined by the SELECTBESTCANDIDATE procedure and finally




Algorithm 1 The routines for inducing an ETRF

const θ: threshold

function INDUCEETRF(Tr: training set; C: concept; n ∈ ℕ): TRF
begin
  Pr ← ESTIMATEPRIORS(Tr, C)   {C prior membership probability estimates}
  F ← ∅
  for i ← 1 to n
    D_i ← BALANCEDBOOTSTRAPSAMPLE(Tr)
    let D_i = ⟨Ps, Ns, Us⟩
    T_i ← INDUCEETDTREE(D_i, C, Pr)
    F ← F ∪ {T_i}
  return F
end

function INDUCEETDTREE(⟨Ps, Ns, Us⟩: training set; C: concept; m: BBA; Pr: priors)
begin
  T ← new ETDT
  if |Ps| = 0 and |Ns| = 0 then
  begin
    if Pr(+1) > Pr(−1) then   {pre-defined constants w.r.t. the whole training set}
      T.root ← ⟨C, m⟩
    else
      T.root ← ⟨¬C, m⟩
    return T
  end
  if (m({−1}) ≃ 0) and (m({+1}) > θ) then
  begin
    T.root ← ⟨C, m⟩
    return T
  end
  if (m({+1}) ≃ 0) and (m({−1}) > θ) then
  begin
    T.root ← ⟨¬C, m⟩
    return T
  end
  RS ← RANDOMSELECTION(ρ(D))   {random selection of specializations}
  S ← ∅
  for E ∈ RS   {assign a BBA to each candidate}
    m' ← COMPUTEBBA(E, ⟨Ps, Ns, Us⟩)
    S ← S ∪ {⟨E, m'⟩}
  ⟨E*, m*⟩ ← SELECTBESTCANDIDATE(S)
  ⟨⟨P^l, N^l, U^l⟩, ⟨P^r, N^r, U^r⟩⟩ ← SPLIT(E*, ⟨Ps, Ns, Us⟩)
  T.root ← ⟨E*, m*⟩
  T.left ← INDUCEETDTREE(⟨P^l, N^l, U^l⟩, E*, Pr)
  T.right ← INDUCEETDTREE(⟨P^r, N^r, U^r⟩, E*, Pr)
  return T
end




Algorithm 2 Class-membership prediction

function CLASSIFYBYTRF(a: individual; F: TRF; C: target concept): L'
begin
  M[] ← new array
  for each T ∈ F
    M[T] ← CLASSIFY(a, T)
  m̄ ← ⊕_{m ∈ M} m   {pooling according to a combination rule}
  for each l ∈ 2^L   {class assignment}
    compute Bel(l) from m̄
  if |Bel({−1}) − Bel({+1})| > ε then
    return argmax_{l ∈ {{−1}, {+1}}} Bel(l)
  else
    return L
end

function CLASSIFY(a, T): m̄
begin
  L ← FINDLEAVES(a, T)   {list of BBAs}
  m̄ ← ⊕_{m ∈ L} m
  return m̄
end



installed in the current node. Specifically, the procedure tries to find the pair
$\langle E^*, m^* \rangle$ having the smallest non-specificity measure value. After the
assessment of the best pair, the individuals are partitioned by the procedure SPLIT for
the left or right branch according to the result of the instance-check w.r.t. $E^*$,
maintaining the same group ($P^{l/r}$, $N^{l/r}$ or $U^{l/r}$). Note that a training example
a is replicated in both children in case both $\mathcal{K} \not\models E^*(a)$ and
$\mathcal{K} \not\models \neg E^*(a)$. The divide-and-conquer strategy is applied
recursively until the instances routed to a node satisfy one of the stopping conditions
discussed above.
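
A rough Python sketch of the two randomized steps of the growing phase is given below. It is only one possible reading of the description above: the exact sampling scheme of [5] is not detailed in this paper, so drawing each class with replacement at a common rate and then discarding the excess of the majority class, as well as the names `balanced_bootstrap_sample` and `random_selection` and their parameters, are assumptions made here for illustration.

```python
import math
import random


def balanced_bootstrap_sample(pos, neg, unc, rate=0.5, rng=None):
    """Stratified draw with replacement, then undersampling of the majority class."""
    rng = rng or random.Random()

    def draw(examples):
        k = math.ceil(rate * len(examples))
        return [rng.choice(examples) for _ in range(k)]

    s_pos, s_neg = draw(pos), draw(neg)
    k = min(len(s_pos), len(s_neg))
    if k > 0:  # randomly discard the exceeding part of the majority class
        s_pos, s_neg = rng.sample(s_pos, k), rng.sample(s_neg, k)
    return s_pos, s_neg, []  # uncertain-membership instances are removed


def random_selection(refinements, rng=None):
    """Pick about sqrt(|refinements|) candidate specializations at random."""
    rng = rng or random.Random()
    k = math.ceil(math.sqrt(len(refinements)))
    return rng.sample(refinements, k)


rng = random.Random(0)
ps, ns, us = balanced_bootstrap_sample(list(range(5)), list(range(100, 200)), ["u1"], 0.5, rng)
print(len(ps), len(ns), len(us))
print(len(random_selection(list(range(16)), rng)))
```

With 5 positive and 100 negative examples and a rate of 50%, this sketch returns 3 examples per class, which corresponds to the fully balanced special case of the (quasi-)balanced sets mentioned above.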



Prediction. After an ETRF is produced, predictions can be made relying on the
resulting classification model. The related procedure sketched in Alg. 2 works as follows.
Given the individual to be classified, for each tree $T_i$ of the forest, the procedure
CLASSIFY returns a BBA assigned to the leaves reached from the root in a path down the
tree. Specifically, the algorithm recursively traverses the ETDT by performing an
instance check w.r.t. the concept contained in each node that is reached: let
$a \in \mathrm{Ind}(\mathcal{A})$ and D the concept installed in the current node; if
$\mathcal{K} \models D(a)$ (resp. $\mathcal{K} \models \neg D(a)$) the left (resp. right)
branch is followed. If neither $\mathcal{K} \models D(a)$ nor
$\mathcal{K} \models \neg D(a)$ is verified, both branches are followed. After the
exploration of a single ETDT, the list L may contain several BBAs. In this case, BBAs are
pooled according to a combination rule such as the Dubois-Prade one [13]. The function
CLASSIFY returns the combined BBA according to this rule (denoted by the symbol
$\oplus$). After polling all trees, the set of BBAs deriving from the previous phase is
exploited to decide the class label for the test individual a.
Function CLASSIFYBYTRF takes an individual a and a forest F. Then, the algorithm
iterates on the forest trees collecting the BBAs via function CLASSIFY. Then, the BBAs
are pooled according to a further combination rule, which can be different from the one




Table 1. Ontologies employed in the experiments

Ontology  DL Lang.   #Concepts  #Roles  #Individuals
BCO       ALCHOF(D)  196        22      112
BioPax    ALCIF(D)   74         70      323
NTN       SHIF(D)    47         27      676
HD        ALCIF(D)   1498       10      639


employed during the exploration of a single ETDT. Additionally, this combination rule
should also be an associative operator [15]. In this way, the result should not be affected
by the pooling order of the BBAs. In our experiments we combined these BBAs via
Dempster's rule (denoted by the symbol $\oplus$ in the function CLASSIFYBYTRF). By
using this rule, the disagreement between classifiers, which corresponds to the conflict
exploited as normalization factor, is explicitly considered by the meta-learner. The final
decision is then made according to the belief function value computed from the pooled
BBA $\bar{m}$. In this case, we aim to select the $l \in 2^{\mathcal{L}}$ which maximizes
the value of the function. However, in order to cope with the monotonicity of the belief
function, which can easily lead to returning an unknown-membership as a final prediction,
the meta-learner must compare the values for the positive and negative class labels and
assign the unknown membership if their values are approximately equal. This is done by
comparing the difference between the belief function values w.r.t. a threshold $\epsilon$.
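
The final decision step can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the pooled BBA is assumed to be a dictionary over frozensets, the function name `decide` and the default value of the threshold are hypothetical, and the belief of a singleton is read directly from its mass.

```python
POS, NEG = frozenset({+1}), frozenset({-1})
UNK = POS | NEG


def decide(pooled_bba, epsilon=0.05):
    """Return {+1}, {-1} or the uncertain label from a pooled BBA."""
    # For a singleton, the belief coincides with the mass assigned to it.
    bel_pos = pooled_bba.get(POS, 0.0)
    bel_neg = pooled_bba.get(NEG, 0.0)
    # Maximizing Bel over all subsets would always return the whole frame,
    # since Bel({-1, +1}) = 1; hence only the two singletons are compared.
    if abs(bel_pos - bel_neg) > epsilon:
        return POS if bel_pos > bel_neg else NEG
    return UNK


print(decide({POS: 0.70, NEG: 0.20, UNK: 0.10}))  # definite positive membership
print(decide({POS: 0.40, NEG: 0.42, UNK: 0.18}))  # beliefs too close: uncertain
```

The threshold controls how conservative the forest is: a larger value returns the uncertain label more often, while a value close to 0 forces a definite answer whenever the two beliefs differ.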



4 Preliminary Experiments 

The experimental evaluation aims at evaluating the effectiveness of the classification 
based on the ETRF models 1 and the improvement in terms of prediction w.r.t. TRFs. 
We provide the details of the experimental setup and present and discuss the outcomes. 

4.1 Setup 

Various Web ontologies have been considered in the experiments (see Tab. 1). They are
available in the TONES repository 2 . For each ontology, 15 query concepts have
been randomly generated by combining (using the conjunction and disjunction operators
or universal and existential restrictions) 2 through 8 (primitive or defined) concepts
of the ontology.

As in previous works [5, 13], because of the limited population of the considered 
ontologies, all the individuals occurring in each ontology were employed as (training 
or test) examples. 

A 10-fold cross validation design of the experiments was adopted so that the final 
results are averaged for each of the considered indices (see below). We compared our 
extensions with other tree-based classifiers: TDTs [6], TRFs [5] and ETDTs [13]. 

1 The source code will be available at: https://github.com/Giuseppe-Rizzo/SWMLAlgorithms

2 http://www.inf.unibz.it/tones/index.php




Table 2. Results of experiments with TDTs and ETDTs models

Ontology  Index  TDT            ETDT
BCO       M%     80.44 ± 11.01  90.31 ± 14.79
          C%     07.56 ± 08.08  01.86 ± 02.61
          O%     05.04 ± 04.28  00.00 ± 00.00
          I%     06.96 ± 05.97  07.83 ± 15.35
BioPax    M%     66.63 ± 14.60  87.00 ± 07.15
          C%     31.03 ± 12.95  11.57 ± 02.62
          O%     00.39 ± 00.61  00.00 ± 00.00
          I%     01.95 ± 07.13  01.43 ± 08.32
NTN       M%     68.85 ± 13.23  23.87 ± 26.18
          C%     00.37 ± 00.30  00.00 ± 00.00
          O%     09.51 ± 07.06  00.00 ± 00.00
          I%     21.27 ± 08.73  75.13 ± 26.18
HD        M%     58.31 ± 14.06  [illegible in source]
          C%     00.44 ± 00.47  00.07 ± 00.17
          O%     05.51 ± 01.81  00.00 ± 00.00
          I%     35.74 ± 15.90  89.24 ± 01.46



In order to learn each ETDT by considering a balanced set of examples, a stratified
sampling was required (see Sect. 3). Three stratified sampling rates related to the $D_i$s
were set in our experiments, namely 50%, 70% and 80%.

Finally, forests with an increasing number of trees were induced, namely: 10, 20
and 30. For each tree in a forest, the number of randomly selected candidates was
determined as the square root of the number of candidate refinements:
$\sqrt{|\rho(\cdot)|}$. We employed these settings for training both ETRFs and TRFs.
As in previous works [6, 5, 13], to compare the predictions made using the forests against
the ground truth assessed by a reasoner, the following indices were computed:

- match rate (M%), i.e. test individuals for which the inductive model and a reasoner
agree on the membership (both {+1}, {−1}, or {−1, +1});

- commission rate (C%), i.e. test cases where the determined memberships are opposite
(i.e. {+1} vs. {−1} or vice versa);

- omission rate (O%), i.e. test cases for which the inductive method cannot determine
a definite membership while the reasoner can ({−1, +1} vs. {+1} or {−1});

- induction rate (I%), i.e. test cases where the inductive method can predict a definite
membership while the reasoner cannot assess it ({+1} or {−1} vs. {−1, +1}).
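
The four indices can be computed with a few lines of Python. The sketch below is an illustration under stated assumptions: memberships are encoded as +1, -1 and 0 (the latter standing for the uncertain label {−1, +1}), and the function name `evaluation_rates` and the toy predictions are hypothetical.

```python
def evaluation_rates(predicted, ground_truth):
    """Return match, commission, omission and induction rates as percentages."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("expected two non-empty sequences of equal length")
    counts = {"match": 0, "commission": 0, "omission": 0, "induction": 0}
    for p, g in zip(predicted, ground_truth):
        if p == g:
            counts["match"] += 1
        elif p == 0:
            counts["omission"] += 1    # model undecided, reasoner definite
        elif g == 0:
            counts["induction"] += 1   # model definite, reasoner undecided
        else:
            counts["commission"] += 1  # opposite definite memberships
    return {name: 100.0 * c / len(predicted) for name, c in counts.items()}


# Toy example: 0 encodes the uncertain membership.
rates = evaluation_rates(predicted=[+1, -1, 0, +1], ground_truth=[+1, +1, +1, 0])
print(rates)
```

The four cases are mutually exclusive and exhaustive, so the rates always sum to 100, which is also a useful sanity check when reading the result tables.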



4.2 Results 

As regards the distribution of the instances w.r.t. the target concepts, we observed that 
negative instances outnumber the positive ones in BCO and Human Disease (HD). In 
the case of BCO this occurred for all concepts but one with a ratio between positive 
and negative instances of 1 : 20. In the case of HD this kind of imbalance occurred for 
all the queries. Moreover, in the case of HD the number of instances with an uncertain- 
membership is very large (about 90%). On the other hand, in the case of NTN, we noted 
the predominance of positive instances: for most concepts the ratio between positive 
and negative instances was 12 : 1 and a lot of uncertain-membership instances were 




Table 3. Comparison between TRFs and ETRF with sampling rate of 50%

                 TRF                                           ETRF
Ontology  Index  10 trees       20 trees       30 trees       10 trees       20 trees       30 trees
BCO       M%     86.27 ± 15.79  86.24 ± 15.94  86.26 ± 15.84  91.31 ± 06.35  91.31 ± 06.35  91.31 ± 06.35
          C%     02.47 ± 03.70  02.43 ± 03.70  02.84 ± 03.70  02.91 ± 02.45  02.91 ± 02.45  02.91 ± 02.45
          O%     01.90 ± 07.30  01.97 ± 07.55  01.92 ± 07.37  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     09.36 ± 13.96  09.36 ± 13.96  09.36 ± 13.96  05.88 ± 06.49  05.88 ± 06.49  05.88 ± 06.49
BioPax    M%     75.30 ± 16.23  75.30 ± 16.23  75.30 ± 16.23  96.92 ± 08.07  96.79 ± 08.15  96.55 ± 08.15
          C%     18.74 ± 17.80  18.74 ± 17.80  18.74 ± 17.80  00.79 ± 01.22  00.91 ± 01.74  00.77 ± 01.74
          O%     00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     01.97 ± 07.16  01.97 ± 07.16  01.97 ± 07.16  02.29 ± 08.13  02.30 ± 08.15  02.30 ± 08.15
NTN       M%     83.41 ± 07.85  83.42 ± 07.85  83.42 ± 07.85  05.38 ± 07.38  05.38 ± 07.38  05.38 ± 07.38
          C%     00.02 ± 00.04  00.02 ± 00.04  00.02 ± 00.04  06.58 ± 07.51  06.58 ± 07.51  06.58 ± 07.51
          O%     13.40 ± 10.17  13.40 ± 10.17  13.40 ± 10.17  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     03.17 ± 04.65  03.16 ± 04.65  03.16 ± 04.65  88.05 ± 08.50  88.05 ± 08.50  88.05 ± 08.50
HD        M%     68.00 ± 16.98  68.00 ± 16.99  67.98 ± 16.99  10.29 ± 00.00  10.29 ± 00.01  10.29 ± 00.02
          C%     00.02 ± 00.05  00.02 ± 00.05  00.02 ± 00.05  00.26 ± 00.26  00.26 ± 00.27  00.26 ± 00.28
          O%     06.38 ± 02.03  06.38 ± 02.03  06.38 ± 02.03  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     25.59 ± 18.98  25.59 ± 18.98  25.59 ± 18.98  89.24 ± 00.26  89.24 ± 00.26  89.24 ± 00.26


found (again, over 90%). A weaker imbalance could be noted with BioPax. For most
query concepts the ratio between positive and negative instances was 1 : 5. In addition,
for most query concepts, there were no uncertain-membership instances: instances of this
kind were available only for 2 queries. The class distribution was balanced for three
concepts only.

Tables 2-5 report the results of this empirical evaluation. In addition, Tab. 6
shows the differences between indexes for TRFs and ETRFs. In general, we can observe
how ensemble methods perform better or, in the worst cases, have the same performance
as a single-classifier approach for most ontologies. For example, when we compare
ETRFs w.r.t. ETDTs, a significant improvement was obtained for BioPax (the match
rate was around 96% for ETRFs and 87% for ETDTs). For BCO, there was a more
limited improvement: it was only around 1.31% and it was likely due to the number of
examples available in BCO. In this case, when ETRF models were induced there was
a larger overlap between the ETDTs in the forests and the sole ETDT model employed
in the single-classifier approach, i.e. the models were very similar to each other.

As regards the comparison between ETRF and TRF models, an improvement of the
match rate and a subsequent decrease of the induction rate were observed for BCO. This
improvement was around 6% for the match rate while it was 3% for the induction rate
when a sampling rate of 50% was employed. The improvement of the match rate was larger
when the sampling rates of 70% and 80% were employed. In this case, the addition of
further instances led to an improvement of the predictiveness of the ETRFs. The
ensemble of models proposed in this paper showed a more conservative behavior w.r.t.
the original version. It can be noted that the increase of match rate was mainly due to
uncertain-membership instances that were not classified as induction cases, as a result of
the values of belief functions employed for making decisions. Another cause is related
to the lack of omission cases. In this case, the procedure for forcing the answer leads to
a decision in favor of the correct class-membership value. Besides, the value of the
commission rate did not change in a significant way. The proposed extension is also more stable in




Table 4. Comparison between TRFs and ETRFs with sampling rate of 70%

Ontology  Index  TRF                                          ETRF
                 10 trees       20 trees       30 trees       10 trees       20 trees       30 trees
BCO       M%     84.12 ± 18.27  85.70 ± 16.98  85.52 ± 17.09  91.31 ± 06.35  91.31 ± 06.35  91.31 ± 06.35
          C%     02.16 ± 03.09  02.32 ± 03.39  02.30 ± 03.38  02.91 ± 02.45  02.91 ± 02.45  02.91 ± 02.45
          O%     04.50 ± 12.59  02.65 ± 09.93  02.86 ± 10.04  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     09.23 ± 13.99  09.33 ± 13.97  09.31 ± 13.91  05.88 ± 06.49  05.88 ± 06.49  05.88 ± 06.49
BioPax    M%     75.30 ± 16.23  75.30 ± 16.23  75.30 ± 16.23  96.65 ± 08.05  95.98 ± 08.13  96.55 ± 08.15
          C%     18.74 ± 17.80  18.74 ± 17.80  18.74 ± 17.80  01.07 ± 01.67  01.71 ± 02.50  00.77 ± 01.74
          O%     00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     01.97 ± 07.16  01.97 ± 07.16  01.97 ± 07.16  02.28 ± 08.13  02.31 ± 08.17  02.30 ± 08.15
NTN       M%     83.42 ± 07.85  83.42 ± 07.85  83.42 ± 07.85  05.50 ± 07.28  05.50 ± 07.28  05.50 ± 07.28
          C%     00.02 ± 00.04  00.02 ± 00.04  00.02 ± 00.04  06.52 ± 07.54  06.52 ± 07.54  06.52 ± 07.54
          O%     13.40 ± 10.17  13.40 ± 10.17  13.40 ± 10.17  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     03.16 ± 04.65  03.16 ± 04.65  03.16 ± 04.65  87.99 ± 08.84  87.99 ± 08.84  87.99 ± 08.84
HD        M%     68.00 ± 16.98  68.00 ± 16.99  67.98 ± 16.99  10.29 ± 00.00  10.29 ± 00.01  10.29 ± 00.02
          C%     00.02 ± 00.05  00.02 ± 00.05  00.02 ± 00.05  00.26 ± 00.26  00.26 ± 00.27  00.26 ± 00.28
          O%     06.38 ± 02.03  06.38 ± 02.03  06.38 ± 02.03  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     25.59 ± 18.98  25.59 ± 18.98  25.59 ± 18.98  89.24 ± 00.26  89.24 ± 00.26  89.24 ± 00.26

Table 5. Comparison between TRFs and ETRFs with sampling rate of 80%
(-- : value missing in the source text)

Ontology  Index  TRF                                          ETRF
                 10 trees       20 trees       30 trees       10 trees       20 trees       30 trees
BCO       M%     --             --             79.33 ± --     --    ± 06.35  --             --
          C%     01.45 ± --     --             --             --             --             --
          O%     --    ± 22.19  --    ± 15.04  10.38 ± 19.28  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     08.47 ± --     --             --             05.88 ± --     --             --
BioPax    M%     --    ± 16.23  75.30 ± 16.23  75.30 ± 16.23  --    ± 07.95  --             96.55 ± 08.15
          C%     --    ± 17.80  18.74 ± 17.80  18.74 ± 17.80  02.24 ± 02.63  03.40 ± 05.54  00.77 ± 01.74
          O%     00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     01.97 ± --     01.97 ± --     01.97 ± --     02.29 ± 08.11  02.31 ± 08.17  02.30 ± 08.15
NTN       M%     83.41 ± 07.85  83.42 ± 07.85  83.42 ± 07.85  05.50 ± 07.28  05.50 ± 07.28  05.50 ± 07.28
          C%     00.02 ± 00.04  00.02 ± 00.04  00.02 ± 00.04  06.52 ± --     06.52 ± 07.54  06.52 ± 07.54
          O%     13.40 ± 10.17  13.40 ± 10.17  13.40 ± --     00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     03.17 ± --     03.16 ± --     03.16 ± 04.65  --    ± 08.84  87.99 ± 08.84  87.99 ± 08.84
HD        M%     68.00 ± 16.98  68.00 ± 16.99  --    ± 16.99  --    ± 00.00  10.29 ± 00.01  10.29 ± 00.02
          C%     00.02 ± 00.05  00.02 ± 00.05  --             --             00.26 ± --     --    ± 00.28
          O%     06.38 ± 02.03  06.38 ± 02.03  06.38 ± 02.03  00.00 ± 00.00  00.00 ± 00.00  00.00 ± 00.00
          I%     25.59 ± 18.98  25.59 ± 18.98  25.59 ± 18.98  89.24 ± 00.26  89.24 ± 00.26  89.24 ± 00.26

terms of standard deviation: for ETRFs, this value is lower than the one obtained for 
TRFs. 
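
Reading Tab. 6 against the per-rate tables suggests that each reported difference is the ETRF mean minus the TRF mean. A minimal sketch of that check, using the BCO figures for a 70% sampling rate and forests of 10 trees; the dictionary layout and the function names are illustrative assumptions, not part of the paper.

```python
# BCO, sampling rate 70 %, 10 trees: (mean, standard deviation) in %.
trf  = {"M": (84.12, 18.27), "C": (2.16, 3.09), "O": (4.50, 12.59), "I": (9.23, 13.99)}
etrf = {"M": (91.31, 6.35),  "C": (2.91, 2.45), "O": (0.00, 0.00),  "I": (5.88, 6.49)}

def deltas(etrf_row, trf_row):
    """ETRF mean minus TRF mean for each index, rounded to two decimals."""
    return {k: round(etrf_row[k][0] - trf_row[k][0], 2) for k in trf_row}

def more_stable(etrf_row, trf_row):
    """True when the ETRF standard deviation is not larger for any index."""
    return all(etrf_row[k][1] <= trf_row[k][1] for k in trf_row)

print(deltas(etrf, trf))       # differences for M, C, O and I
print(more_stable(etrf, trf))  # stability in terms of standard deviation
```

A positive difference favors ETRFs only for the match rate; for commission and omission a negative difference is the favorable one.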

With BioPax, we again observed an increase of the match rate and a significant decrease
of the commission rate. The induction rate was also larger with ETRFs than with TRFs,
likely due to the procedure for forcing the answer. As regards the experiments on the HD
and NTN ontologies, we can observe that, differently from the original version of TRFs,
the induction rate was very high when ETRFs were employed. This result was mainly due to
the original data distribution, which showed an overwhelming number of
uncertain-membership instances. As previously mentioned, they represented about 50% of
the total number of instances in the ABox of HD and about 90% for NTN. TRFs showed a
conservative behavior by returning an unknown membership (due to uncertain results of
the intermediate tests during the exploration of trees [5]), which




Table 6. Differences between the results for TRF and ETRF models. The symbol • denotes a
positive or negative difference that is in favor of ETRFs, while the symbol ◦ denotes a
positive or negative difference that is in favor of TRFs

Ontology  Index  Sampling rate 50%             Sampling rate 70%             Sampling rate 80%
                 10 trees  20 trees  30 trees  10 trees  20 trees  30 trees  10 trees  20 trees  30 trees
BCO       ΔM%    +05.04 •  +05.07 •  +05.05 •  +07.19 •  +05.61 •  +05.79 •  +15.74 •  +10.04 •  +11.98 •
          ΔC%    +00.44 ◦  +00.48 ◦  +00.07 ◦  +00.75 ◦  +00.59 ◦  +00.61 ◦  +01.46 ◦  +01.02 ◦  +01.27 ◦
          ΔO%    -01.90 •  -01.97 •  -01.92 •  -04.50 •  -02.65 •  -02.86 •  -13.51 •  -08.05 •  -10.38 •
          ΔI%    -03.48 •  -03.48 •  -03.48 •  -03.35 •  -03.45 •  -03.43 •  -02.59 •  -02.77 •  -03.48 •
BioPax    ΔM%    +21.62 •  +21.49 •  +21.25 •  +21.35 •  +20.68 •  +21.25 •  +20.17 •  +18.99 •  +21.25 •
          ΔC%    -17.95 •  -17.83 •  -17.97 •  -17.67 •  -17.03 •  -17.97 •  -16.50 •  -15.34 •  -17.97 •
          ΔO%    +00.00 •  +00.00 •  +00.00 •  +00.00 •  +00.00 •  +00.00 •  +00.00 •  +00.00 •  +00.00 •
          ΔI%    +00.32 ◦  +00.33 ◦  +00.33 ◦  +00.31 ◦  +00.34 ◦  +00.33 ◦  +00.32 ◦  +00.34 ◦  +00.33 ◦
NTN       ΔM%    -78.03 ◦  -78.04 ◦  -78.04 ◦  -77.92 ◦  -77.92 ◦  -77.92 ◦  -77.91 ◦  -77.92 ◦  -77.92 ◦
          ΔC%    +06.56 ◦  +06.56 ◦  +06.56 ◦  +06.50 ◦  +06.50 ◦  +06.50 ◦  +06.50 ◦  +06.50 ◦  +06.50 ◦
          ΔO%    -13.40 •  -13.40 •  -13.40 •  -13.40 •  -13.40 •  -13.40 •  -13.40 •  -13.40 •  -13.40 •
          ΔI%    +84.88 ◦  +84.89 ◦  +84.89 ◦  +84.83 ◦  +84.83 ◦  +84.83 ◦  +84.82 ◦  +84.83 ◦  +84.83 ◦
HD        ΔM%    -57.71 ◦  -57.71 ◦  -57.69 ◦  -57.71 ◦  -57.71 ◦  -57.69 ◦  -57.71 ◦  -57.71 ◦  -57.69 ◦
          ΔC%    +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦  +00.24 ◦
          ΔO%    -06.38 •  -06.38 •  -06.38 •  -06.38 •  -06.38 •  -06.38 •  -06.38 •  -06.38 •  -06.38 •
          ΔI%    +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦  +63.65 ◦

tends to preserve the matches with the gold-standard membership also in case of
uncertain membership. This explains the high match rate observed for TRFs in the
experiments. After the induction of ETRFs, the models showed a braver behavior, also due
to the forcing procedure. As a result, they tend to assign a positive or negative
membership to a test instance more easily, leading to an increase of the induction rate,
with a value of about 89%, while no omission cases occurred. Induction cases represent
new non-derivable knowledge that can be potentially useful for ontology completion, but
their large number suggests that the result may also be due to noise (partly because the
entire ABox was employed as dataset). This basically means that most induced assertions
may not be definitely related to the learned concepts, but they cannot be considered
real errors, unlike commission cases.

Similarly to our previous experiments in [5], we also observed that the generated
concept descriptions installed as nodes of each ETDT do not improve the quality of the
splits, as in the case of TDTs, where training was led by the information gain
criterion. This occurred for all the datasets considered here. In both cases, most
instances were sent along one branch, while a small number of them were sent along the
other one. This means that the small disjuncts problem is common to both TRFs and ETRFs,
and that neither the information gain nor the non-specificity measure can be considered
a suitable measure for selecting the best concept description used to split instances
during the training phase. A further remark concerns the predictiveness of the proposed
method w.r.t. both the sampling methods and the number of trees in a forest. Also for
ETRFs, the performance did not change significantly when a larger number of trees was
set or when the algorithm resorted to a larger stratified sampling rate. While in the
former case the results are likely due to a weak diversification between ETDTs, in the
latter case the result was likely due to the availability of examples whose employment
did not change the quality of the splits generated during the growth process. For ETRFs,
similarly to TRF models, the refinement operator is still a bottleneck for the learning
phase: execution times spanned from a few minutes to almost 10 hours, as in the
experiments in [5]. Moreover, when an intermediate test with an uncertain result was
encountered, the exploration of alternative paths affected the efficiency of the
proposed method.
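
The effect of lopsided splits on a purity-based criterion can be illustrated with a small calculation. The sketch below uses the textbook two-class information gain as a simplification; the criterion actually used for TDTs and the non-specificity measure used for ETDTs are not reproduced here, and the class counts are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, left, right):
    """Gain obtained by splitting `parent` class counts into `left` and `right`."""
    n = sum(parent)
    weighted = sum(left) / n * entropy(left) + sum(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = [50, 50]                  # 50 positive and 50 negative examples
lopsided = ([48, 50], [2, 0])      # almost every instance follows one branch
spread = ([45, 5], [5, 45])        # instances distributed over both branches

print(round(information_gain(parent, *lopsided), 3))
print(round(information_gain(parent, *spread), 3))
```

The lopsided split yields a much smaller gain than the spread one: when every candidate concept description sends nearly all instances down one branch, the criterion has little to choose from, which matches the small disjuncts behavior described above.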



5 Conclusion and Extensions 

We have proposed an algorithm for inducing Evidential Terminological Random Forests,
an extension of Terminological Random Forests devised to tackle the class-imbalance
problem when learning predictive classification models for SW knowledge bases. Like the
original version, the algorithm combines a sampling approach with ensemble learning
techniques. The resulting models combine predictions that are represented as basic
belief functions rather than votes, exploiting combination rules in the context of the
Dempster-Shafer Theory for making the final decision. In addition, a preliminary
empirical evaluation with publicly available ontologies has been performed. The
experiments have shown that the new classification model seems to be more predictive
than the previous ones and tends to assign a definite membership. Besides, the
predictiveness of the model is sufficiently tolerant to variations of the number of
trees and of the sampling rate. The standard deviation is also lower than that of the
original TRFs. In the future, we plan to extend the method along various directions.
One regards the choice of the refinement operator that may be applied in order to
generate more discriminative intermediate tests. This plays a crucial role for the
quality of the classifiers involved in the ensemble model, in order to obtain quite
predictive weak learners from both expressive and shallow ontologies extracted from the
Linked Data cloud [20]. In order to cope with the latter case, the method could be
parallelized in order to employ it as a non-standard tool to reason over such datasets.
Further ensemble techniques and novel rules for combining the answers of the weak
learners could be employed. For example, weak learners can be induced from subsets of
training instances generated by means of a procedure based on cross-validation rather
than sampling with replacement. Finally, further investigations may concern the
application of strategies aiming to optimize the ensemble, which is an important
characteristic of such learning methods [12, 21].
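
To make the idea of pooling belief functions instead of votes concrete, the following is a minimal sketch that combines the basic belief assignments (BBAs) returned by the trees of a forest. It uses Dempster's rule as a stand-in, since this section does not state which combination rule the ETRF algorithm applies; the `force` flag, the `margin` threshold, the tie-breaking toward +1 and the example masses are hypothetical as well.

```python
from functools import reduce
from itertools import product

POS, NEG = frozenset({+1}), frozenset({-1})
OMEGA = POS | NEG  # total ignorance about the membership

def dempster(m1, m2):
    """Combine two BBAs (dicts: focal set -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # mass given to contradictory answers
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def belief(m, hypothesis):
    """Belief: total mass of the focal sets contained in the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)

def decide(bbas, force=True, margin=0.1):
    """Pool the trees' BBAs and return +1, -1 or 0 (unknown membership)."""
    m = reduce(dempster, bbas)
    bel_pos, bel_neg = belief(m, POS), belief(m, NEG)
    if force or abs(bel_pos - bel_neg) > margin:
        return +1 if bel_pos >= bel_neg else -1   # assumed tie-break toward +1
    return 0

# Hypothetical BBAs produced by three trees for one test individual.
trees = [
    {POS: 0.6, NEG: 0.1, OMEGA: 0.3},
    {POS: 0.5, NEG: 0.2, OMEGA: 0.3},
    {POS: 0.1, NEG: 0.4, OMEGA: 0.5},
]
print(decide(trees))
```

Unlike majority voting, the pooled BBA keeps track of how much mass each tree leaves on total ignorance and how much conflict arises between trees, so a forced answer can still be taken on weak but consistent evidence.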